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1  Introduction 

The  remarks  on  uncertainty  offered  here  were  stimulated  by  a  special  session 
at  the  SIAM  National  Meeting  in  Toronto  in  July,  1998.  In  particular,  presen¬ 
tations  by  Mac  Hyman  and  Linda  Petzold  prompted  the  simple  but  hopefully 
useful  observations  we  make.  The  general  topic  of  “quantifiying  uncertainty” 
and  then  minimizing  or  controlling  it  was  the  focus  of  a  SIAM/NSF  workshop 
in  Washington  in  June  1998  and  discussions  with  participants  of  that  workshop 
also  influenced  us  to  think  about  the  topic. 

Uncertainty  (recognized  or  not)  is  present  in  much  of  what  we  do  as  applied 
mathematicians.  It  arises  in  our  approximations  to  physical  and  biological  sys¬ 
tems  when  we  write  dynamical  or  static  equations  and  constraints.  Our  models 
may  be  compartmental  or  distributed  and  often  involve  reduction  in  order  and 
subsequent  discretization  either  at  the  finite  or  infinite  dimensional  level.  Once 
we  have  models  for  the  processes  of  interest,  we  often  introduce  computational 
uncertainty  through  forward  simulations,  inverse  or  parameter  estimation  pro¬ 
cedures,  and  sensitivity  and  robustness  analyses  (as  in  control  design). 

Most  often  our  efforts  do  not  include  discussions  of  these  uncertainties.  In¬ 
deed  there  is  a  general  absence  of  accepted  scientific  frameworks  for  quantifying 
and  controlling  uncertainties  from  underlying  sources  of  error  in  estimates,  com¬ 
putations  and  analysis. 

Seldom  are  we  able  to  treat  with  care  the  relationship  between  levels  of  un¬ 
certainty  and  the  nondeterministic  nature  of  currently  popular  “softer”  sciences 
involving  biological,  sociological,  economic  and  demographic  (especially  popula¬ 
tion  behavior)  studies.  There  are  an  overwhelming  number  of  related  questions. 
What  do  we  mean  by  a  “good  model”?  How  do  we  know  when  we  have  one? 
How  do  we  assess  accuracy  in  fitting  of  data  to  competing  models?  How  much 
weight  do  we  give  to  predictive  capability  of  models  versus  their  preservation  of 
certain  qualitative,  quantitative  or  asymptotic  behaviors?  In  regard  to  forward 
simulations  or  inverse  problem  estimates,  how  do  we  attach  “goodness”  or  va¬ 
lidity  to  numbers  that  we  produce  from  computations?  More  specifically,  how 
do  we  introduce  nondeterministic  features  in  our  models  and  calcualtions  such 
as  numerical  simulations  or  parameter  estimation  algorithms? 

Immediate  possibilities  come  to  mind  in  response  to  such  philosophical  cog¬ 
itations.  Among  the  more  traditional  approaches  are  the  following: 

1)  Use  of  stochastic  ordinary  or  partial  differential  equation  models.  These 
models  may  involve  simple  additive  “noise”  terms  and  random  initial  val¬ 
ues  or  they  may  include  a  much  more  comprehensive  treatment  of  stochas- 
ticity  in  coefficients  (parameterizations),  nonlinearities,  boundary  formu¬ 
lations,  constraints,  etc. 

2)  Use  of  “error  bars”  as  uncertainty  bounds  in  simulation  results.  These 
error  bars  may  be  in  recognition  of  modeling  error  including  parameteri¬ 
zation  error,  initial  data  error,  computational  (formula  and/or  machine) 
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error,  etc.  This  approach  could  encompass  the  reporting  of  computational 
results  in  a  manner  similar  to  that  used  by  experimentalists  with  data, 
requiring  scientists  to  discuss  the  variability  and  uncertainty  associated 
with  a  particular  set  of  numbers  generated  by  simulations. 

3)  Use  of  uncertainty  or  error  bars  in  parameter  estimates,  optimal  controls, 
sensitivity  and  robustness  bounds.  This  is  essentially  the  same  approach 
as  in  2)  extended  to  a  much  larger  range  of  computational  questions. 
For  example,  consider  a  parameter  estimation  problem  involving  a  best 
estimate  q*  to  be  chosen  from  a  set  Q  of  admissable  values  using  data  x.t 
for  x(ti)  where  x(t)  =  f(t,x(t),q).  As  an  alternative,  one  might  calculate 
not  q*,  but  a  range  of  values  in  [q  —  S,q  +  5]  where  S  embodies  probablistic 
information  and  q  approximates  q* . 

To  continue  with  the  rather  concrete  items  mentioned  above,  we  note  that 
while  l)-3)  above  each  include  stochasticity  or  uncertainty,  1)  versus  2), 3)  entail 
very  different  paradigms  for  inclusion  of  uncertainty.  The  stochastic  differen¬ 
tial  equation  approach  which  leads  to  a  stochastic  calculus,  for  example,  for 
equations  of  the  form  dx(t)  =  f(t,x(t),q)dt  +  dw(t),  has  resulted  over  the  past 
decades  in  a  tremendous  amount  of  mathematical  literature  and  numerous  the¬ 
oretical  tools.  However,  there  are  a  number  of  limitations  in  this  approach.  In 
particular,  one  generally  requires  very  specific  types  of  noise  (additive,  initial 
data,  white  noise,  etc.)  to  obtain  a  rigorous  theory.  Stochastic  parameters  (such 
as  rate  constants,  delays,  nonlinearities)  not  in  a  special  class  are  generally  not 
amenable  to  a  theoretical  treatment.  Overall  the  theoretical  frameworks  de¬ 
veloped  have  not  had  the  desired  impact  on  practical  aspects  of  uncertainty  in 
modeling  and  computation.  The  research  literature  (applied  as  well  as  theoret¬ 
ical)  also  contains  many  specialized  stochastic  differential  equation  simulation 
methods,  but  ensembles  of  solutions  are  often  computationally  expensive  for 
systems  of  interest  in  applications. 

With  respect  to  2)  and  3),  the  nearest  semblance  to  an  “error  bar”  theory 
presently  available  are  the  a  priori  error  bounds  that  can  be  derived  for  many 
numerical  approximation  schemes.  However,  these  a  priori  bounds  are  often 
not  available,  not  used  properly,  or  simply  not  very  helpful  in  addressing  ques¬ 
tions  related  to  uncertainty  in  numerical  computations.  As  we  shall  see  in  the 
next  section,  there  are  some  significant  mathematical  foundations  in  probability 
theory  that  can  be  used  in  inverse  problem  methods.  But  to  date  we  have  not 
made  very  good  use  of  them  to  develop  practical  uncertainty  assessment  tools. 

Happily,  there  is  a  “bad  news,  good  news”  scenario  in  all  of  this.  To  date, 
there  is  a  dearth  of  good  practical  tools  available  for  the  assessment  and  control 
of  uncertainty  in  the  computational  sciences.  The  good  news  resides  in  the 
potential  for  significant  contributions  in  such  a  wide  open  field  —  there  are 
reputations  (and  maybe  money  too!)  to  be  made  by  those  who  can  contribute 
even  simple,  sound  mathematical  theories  resulting  in  practical,  user  friendly 
tools. 
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In  the  next  section  we  will  outline  an  approach  to  inverse  or  parameter 
estimation  problems  that  relies  heavily  on  known  results  in  probability  theory. 
As  we  shall  see,  this  approach  allows  one  to  attach  quantitative  measures  of 
variability  to  estimated  parameters  by  viewing  them  as  random  variables. 


2  Estimation  of  Parameters  in  Dynamical  Sys¬ 
tems 


To  illustrate  our  ideas,  we  consider  here  the  estimation  of  constant  parameters 
in  a  system  of  ordinary  differential  equations.  The  ideas  are  readily  extended 
to  partial  differential  equations  with  unknown  functional  (e.g.,  spatially  and/or 
time  dependent  coefficients)  parameters  as  well  as  to  many  other  systems  of 
interest  in  applications. 

A  typical  estimation  problem  consists  of  using  observations  {x}/=1  for  x(t{), 
i  =  1, 2, . . . ,  n,  to  estimate  parameters  q  £  Rm  in  the  vector  dynamical  system 


x(t)  =  f(t,x(t),q).  (1) 

This  is  often  done  using  a  least  squares  formulation  to  find  a  best  parameter 
value  q*  in  some  admissible  parameter  set  Q  C  Rm.  Thus,  we  seek  to  find 
q*  £  Q  that  provides  a  minimum  for 


n 

Jfa)  =  X]  ~£i\2  (2) 

i= 1 

where  x(t\  q)  is  the  solution  of  (1)  for  a  given  q  £  Q. 

To  introduce  uncertainty  into  this  deterministic  process,  we  treat  the  pa¬ 
rameters  as  realizations  for  a  random  variable  and  use  the  data  to  estimate  the 
probability  distribution  function  (PDF)  for  this  random  variable.  More  precisely, 
let  V(Q)  denote  the  linear  space  of  probability  distributions  on  Q  and  treat  the 
data  {£i }  as  observations  for  the  expected  value 


S[x(U;q)\P\=  x(ti]  q)dP(q)  (3) 

Jq 

for  a  given  PDF  P  £  V{Q).  If  P  is  a  discrete  PDF  with  atoms  {qj}  C  Q  and 
probabilities  {pj},Pj  >  0 =  1)  then  (3)  can  be  written 


f  x{ti]  q)dP{q)  = 
Jq 


■{U\qj)p. 


J- 


(4) 


In  any  case,  the  least  squares  estimation  problem  becomes:  Find  P*  £  V{Q)  to 
minimize 
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(5) 


n 

J(p)  =  X  \£ 

»=1 

over  P  G  V(Q).  For  this  problem  to  be  tractable,  it  would  be  most  helpful  to 
have  a  topology  on  V(Q),  continuity  of  the  map  P  —>  J{P)  in  this  topology, 
compatible  compactness  results,  and  some  approximation  results  leading  to  im- 
plementable  computational  algorithms.  Fortunately,  probability  theory  offers  a 
great  start  toward  a  possible  complete  and  tractable  computational  methodol¬ 
ogy!  We  summarize  useful  results  from  Billingsly  [1]. 

We  begin  with  a  metric,  the  Prohorov  metric ,  for  V(Q),  the  set  of  probability 
measures  on  the  Borel  subsets  of  Q,  where  Q  can  be  any  complete  metric  space 
with  metric  d.  For  any  closed  subset  F  C  Q  and  e  >  0,  we  define  an  e— 
neighborhood  of  F  by 


Fe  =  {qeQ  | d{q,q)  <  e,q  G  F}  . 

We  then  define  p  :  V{Q)  x  V{Q)  — ►  R+  by 

p(Pi,  P2)  =  inf {e  >  0  |Pi[F]  <  P2[F£]  +  £,  F  closed  C  Q}  . 

The  following  results  are  well-known: 

(i)  p  is  a  metric  (called  the  Prohorov  metric)  on  V(Q) 

(ii)  ( V(Q),p )  is  a  complete  metric  space; 

(iii)  if  Q  is  compact,  then  ( V(Q),p )  is  a  compact  metric  space. 

To  develop  any  approximation  ideas,  we  need  to  understand  the  meaning  of 
the  convergence  Pfc  — >  P  in  the  p  metric.  The  definition  of  the  Prohorov  metric 
is  neither  intuitive  nor  easily  used  directly.  Again,  we  are  fortunate  to  have  the 
following  theorem  on  equivalent  formulations. 

Theorem  1  Suppose  ( Q,d )  is  a  complete  metric  space  and  (P(Q),  p)  is  defined 
as  above.  Then  for  Pk,  P  m  V(Q),  the  following  are  equivalent: 

(i.)  p(Pfc,P)  -t  0; 

(ii.)  Jq  fdPk(q)  — >  Jq  fdP(q)  for  all  bounded,  uniformly 
continuous  f  :  Q  — >  F1 ; 

(iii.)  P&[A]  — >  P[A \  for  all  Borel  sets  A  C  Q  with  P[dA]  =  0,  where  dA  denotes 
the  boundary  of  A. 
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Thus  we  see  that  convergence  in  the  p  metric  is  equivalent  to  convergence 
in  distribution.  Or,  if  we  consider  V(Q)  C  Cb(Q)* ■  where  Cb(Q )  denotes 
the  space  of  bounded,  continuous  functions  on  Q  with  supremum  norm,  then 
convergence  in  the  p  topology  is  equivalent  to  weak*  convergence  in  P(Q). 
More  importantly  to  us  for  our  discussions  here,  p(Pk,P)  — »  0  is  equivalent  to 


[  x{t’,q)dPk(q) ->  [ 
Jo  Jo 


x(t]  q)dP(q), 


or,  using  standard  notation  from  probability  theory, 


£[x(t;q)  | Pk]  -t  £[x{t;q)\P]. 

This  is  precisely  the  convergence  one  needs  to  establish  continuity  in  the  p 
topology  of  the  map 


n 

P^J(P)  =  J2  \£Wi\q)\P}~  ■ 

i=  1 

Continuity  of  this  map  along  with  the  compactness  of  Q,  which  in  turn  guaran¬ 
tees  compactness  of  V{Q)  in  the  p  metric,  is  then  sufficient  to  establish  existence 
of  solutions  to  the  problem  of  minimizing  J  of  (5)  over  V{Q). 

Assuming  that  existence  questions  are  dealt  with,  we  note  that  V{Q)  with 
the  p  metric  is  generally  an  infinite  dimensional  space.  Thus,  to  treat  computa¬ 
tional  issues  one  must  consider  approximations  ideas.  Once  again,  probability 
theory  provides  the  needed  results. 


Theorem  2  Suppose  that  Q  is  a  complete  separable  metric  space  and  let  Qr  = 
i  be  a  countable  dense  subset  of  Q.  Then  the  set  of  P  £  P{Q)  such  that 
P  has  finite  support  in  Qr  is  dense  in  V{Q)  in  the  p  metric.  That  is, 


Vr{Q)  =  {p  =  Ej=iPAj  Pj  >  0,Ej=iPj  =  M  e  N+,qj  e  Qr} 


is  dense  in  V(Q).  Here  5qj  is  the  Dirac  measure  with  atom  at  qj  and  N+  are 
the  positive  integers. 


For  a  given  positive  integer  M  and  Qr  as  in  the  theorem,  define 


VM  =  {PeVr(Q)  F-E^ipAA 


(6) 


Then  VM  is  a  compact  subset  of  V(Q)  with  PM  C  VM+1  and  U %=1VM  = 
Vr{Q).  Moreover,  the  denseness  of  Pr(Q)  in  P(Q)  allows  us  to  approximate  any 
element  P  £  V{Q)  by  a  sequence  {Pm},  Pm  &  VM ,  such  that  p(Pm,  P)  — >  0  as 
M  oo. 
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To  define  a  family  of  approximate  minimization  problems,  we  fix  {qi  ,q2,...,  Qm} 
in  Qr  with  associated  VM  defined  by  (6).  Then  for  Pm  =  Ejii Pj^q,  £  VM , 
the  minimization  criterion  (5)  reduces  to 

J{Pm)  =  E"=i  Ejli  X(U;  qj)pj  -  Xi  (7) 

and  our  approximate  estimation  problem  becomes  one  of  minimizing  J(Pm)  of 
(7)  over  Pm  £  VM ■  We  observe  that  this  problem  is  computationally  most 
attractive.  Indeed  it  is  a  constrained  quadratic  programming  problem.  Letting 
p  =  (pi, . . .  ,pm)  £  RM  with  pj  >  0,  Ei* Pj  =  1;  we  see  that  minimizing  (7)  is 
equivalent  to  minimizing 

J{p)  =  p  •  Ap  + 2p -b  +  c  (8) 

where 

n 

Akj  =  ^2x(ti\qk)x(ti\qj)  fc,  j  =  1, 2, . . . ,  M 

i=l 

n 

bj  =  -Y,xix{ti]qj)  j  =  l,  2,  ...,M 

c  =  £>?■ 

i=  1 

That  is,  we  are  reduced  to  seeking  a  minimizer  p*  for  J (p)  subject  to  p  ■  lj  > 

0,  j  =  1, 2, . . . ,  M,  and  p  ■  1  =  1.  Here  lj  is  the  M  vector  with  zero  components 
except  for  a  1  in  the  jth  component  and  1  =  (1, 1, . . . ,  1)  6  RM . 

For  this  constrained  optimization  problem  there  are  many  suitable  tech¬ 
niques,  e.g.,  Lagrange  multiplier  methods,  etc.  Indeed,  MATLAB  has  routines 
that  render  this  problem  quickly  solvable  and  we  shall  illustrate  this  by  example 
below. 

We  remark  that  any  solution  p*  must  satisfy 

Ap*  =  -b 

and  if  A  is  nonsingular,  p*  is  given  uniquely  and  depends  continuously  on  —b 
and  hence  continuously  on  {aq}f_  1.  This  is  precisely  the  requirements  for  the 
finite  dimensional  problems  to  each  be  well-posed  or  stable  in  inverse  problem 
terminology  (i.e. ,  existence,  uniqueness,  and  continuous  dependence  on  data  for 
P*)- 

3  Example 

We  present  a  very  simple  example  to  illustrate  a  computational  algorithm  result¬ 
ing  from  theoretical  discussions  of  the  previous  section.  The  particular  system 
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we  use  is  motivated  by  a  problem  in  the  assessment  of  the  efficiency  of  a  vacci¬ 
nation  program  [2,  3]  by  using  data  {xi}  for  the  aggregate  population  x(tt)  of 
vaccinated  but  not  infected  individuals  at  time  tt.  The  evolution  of  population 
is  given  by 

x(t )  =  —qG(t)x(t),  a;(0)  =  £o,  (9) 

where  G(t)  represents  the  known  rate  of  exposure  to  infection,  q  is  the  suscepti¬ 
bility  to  “environmental  exposure”  subsequent  to  vaccination  of  the  population 
at  time  t  =  0,  and  xq  is  the  known  number  of  individuals  initially  vaccinated. 
The  best  susceptibility  parameter  q  is  to  be  chosen  from  Q  =  [0, 1]  and  in  the 
spirit  of  the  presentation  of  the  previous  section,  we  seek  to  identify  a  proba¬ 
bility  distribution  P(q)  €  P(Q)  using  the  data  {xt}  as  the  expected  number  of 
vaccinated  but  not  yet  infected  individuals  for  a  given  distribution. 

All  calculations  were  carried  out  using  MATLAB  routines.  “Data”  {xi}  was 
prepared  by  solving  (9)  with  a  specific  value  of  q  =  q*  (=  .5  in  the  example 
presented  here)  and  G(t)  given  by 

(  0  0  <  t  <  .195 

G(t)  =  <  100(t  -  .195)  .195  <  t  <  .205 
[  1  .205  <  t. 

Relative  random  noise  was  added  to  these  solutions  so  that  the  “data”  was  given 
by  Xi  =  x(tt:  q*)[l  +£-,]  where  the  e.t  are  independent  Gaussian  random  variables 
with  mean  zero  and  variance  a 2  (cr  =  .005  was  used  in  the  computations  detailed 
below). 

To  discretize  Q  in  formulating  the  approximation  class  given  in  (6),  for 
a  given  M  we  used  a  uniform  partition  of  Q  defined  by  {gi,  •  •  • ,  <Zm}  where 
qi  =  0,  qm  =  1  and  qj+i  —  qj  =  M1_1  ■  The  resulting  constrained  problem  using 
(8)  was  solved  using  the  qp  (constrained  quadratic  programming)  algorithm  in 
MATLAB.  Since  the  system  (9)  was  particularly  simple  in  this  case,  the  values 
x(t,L\  q*)  were  obtained  analytically. 

We  present  in  the  figures  below  the  estimated  discrete  probability  densities 
represented  by  p  =  (pi , . . .  ,pm)  and  the  corresponding  discontinuous  distribu¬ 
tions  Pm  =  Ylp=i  Pj^Qj  f°r  several  values  of  M.  We  note  that  in  general  we 
observe  convergence  of  the  probablity  distributions  in  the  Prohorov  metric  (as 
guaranteed  by  the  theory  of  the  previous  section) ,  while  the  discrete  densities  do 
not  in  general  converge  in  any  meaningful  sense  on  Q  =  [0, 1].  In  this  example 
the  true  probability  distribution  P*  is  given  by  P*[A]  =  0  if  .5  ^  A.  P*[A]  =  1 
if  .5  £  A.  The  computational  algorithm  performs  similarly  in  cases  where  the 
underlying  distribution  is  a  continuous  distribution  with  a  corresponding  prob¬ 
ability  density  function  (e.g.,  see  [2]). 
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4  Concluding  Remarks 

The  probablistic  ideas  presented  in  Section  2  above  have  been  used  in  a  fun¬ 
damental  way  in  a  diverse  number  of  applications.  They  are  the  foundation 
of  approximation  ideas  developed  in  the  1950’s-1960’s  in  control  theory  (e.g., 
chattering  controls,  Young’s  measures,  sliding  regimes,  relaxed  curves,  etc.  in 
the  classical  works  of  L.C.  Young,  E.J.  McShane,  A.F.  Filippov  and  J.  Warga 
between  1937  and  1970  —  see  [4]  for  discussions  and  references)  and  in  the  the¬ 
ory  of  effective  material  moduli  in  composite  materials  in  the  1970’s  -  1980’s. 
More  recently  they  have  been  used  in  estimation  of  distributed  growth  rates 
from  aggregate  population  data  [5,  6,  7]  and  in  an  application  to  the  Preisach 
theory  of  hysteresis  in  input  operators  for  shape  memory  alloys  [8,  9]. 

The  considerations  in  the  previous  sections  (which  are  simple  initial  remarks) 
suggest  the  possiblity  of  a  useful  reformulation  of  many  of  the  concepts  and 
approaches  in  inverse  problem  methodology,  both  theory  and  computation. 

There  are  a  number  of  other  relevant  important  topics  that,  due  to  lack  of 
time,  we  will  not  mention  in  detail  here.  Included  are  model  approximation 
and  order  reduction  and  the  question  of  “goodness”  of  the  reduced  order  model. 
The  popular  Karhunen  -  Loeve  or  proper  orthogonal  decomposition  (POD) 
techniques  offer  great  promise  here.  These  model  reduction  techniques  have  been 
shown  to  be  quite  useful  in  both  open  loop  and  feedback  or  closed  loop  control 
computational  methodologies  (see  [10,  11]  and  the  references  therein)  and  are 
currently  being  tested  in  inverse  problem  methodologies.  The  POD  formulation 
itself  is  of  interest  since  it  inherently  contains  techniques  for  computing  the 
efficiency  of  the  reduced  order  model  in  capturing  the  dynamic  energy  present 
in  the  data  or  large  scale  model  used  to  generate  “snapshots”.  It  thus  has 
potential  for  use  in  a  “goodness  of  model”  framework. 

In  inverse  problem  methodology,  there  are  least  squares  probablistic  based 
statistical  tests  for  comparing  the  fits  of  different  models  to  data  wherever  the 
models  are  special  cases  of  a  larger  model  (see  [12]  and  in  particular  Chapter  V.5- 
7  of  [13]).  The  underlying  ideas  can  be  exploited  to  develop  a  type  of  uncertainty 
measure  in  inverse  problem  techniques. 
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